chore: set `ensure_ascii=False` for json serialization to preserve unicode chars #1386

gfx · 2025-10-01T04:38:08Z

In the Langfuse web UI, output is encoded as Unicode escape sequences (\uXXXX). This is the default behavior of Python's json.dumps().

It's annoying for non-English people like me, so this PR sets ensure_ascii=False for json.dumps().

Important

Sets ensure_ascii=False in JSON serialization to preserve Unicode characters and adds tests to verify this behavior.

Behavior:
- Set ensure_ascii=False in json.dumps() in _serialize() in attributes.py, _next() and _get_item_size() in score_ingestion_consumer.py, and post() in request.py to preserve Unicode characters.
Testing:
- Added test_unicode_serialization.py to verify Unicode characters are preserved in serialized output.


Disclaimer: Experimental PR review

Greptile Overview

Updated On: 2025-10-01 04:39:28 UTC

Summary

This pull request improves Unicode character handling in the Langfuse Python SDK by adding `ensure_ascii=False` to all `json.dumps()` calls throughout the codebase. The change affects three critical serialization points: the OpenTelemetry attributes serializer (`_serialize` function in `attributes.py`), the HTTP client for API requests (`request.py`), and the score ingestion consumer for batch processing (`score_ingestion_consumer.py`).

By default, Python's json.dumps() escapes non-ASCII characters as Unicode sequences (e.g., \u3053\u3093\u306b\u3061\u306f instead of こんにちは), making output unreadable for non-English users in the Langfuse web UI. This change preserves Unicode characters in their native form, significantly improving the user experience for international users working with Japanese, Chinese, Korean, Arabic, Russian, and other non-Latin scripts.
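The difference described above can be reproduced with the standard library alone. This is a minimal sketch; the `payload` dictionary is an illustrative assumption and is not taken from the SDK.

```python
import json

# Illustrative payload (an assumption, not SDK data).
payload = {"greeting": "こんにちは"}

# Default behavior: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are written as-is.
preserved = json.dumps(payload, ensure_ascii=False)

print(escaped)    # {"greeting": "\u3053\u3093\u306b\u3061\u306f"}
print(preserved)  # {"greeting": "こんにちは"}
```

Both strings are valid JSON; only the textual representation of the non-ASCII characters differs.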

The PR includes a comprehensive test file (test_unicode_serialization.py) that validates Unicode preservation across multiple writing systems and emoji. The change is backward-compatible as the resulting JSON remains valid, and the modification is applied consistently across all serialization points to ensure uniform behavior throughout the SDK.
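The backward-compatibility claim can be checked with a round trip: escaped and unescaped output should decode to the same object. The sketch below uses a made-up `event` dictionary and plain `json` calls; it does not reproduce the SDK's own request or batching code.

```python
import json

# Made-up event mixing several scripts and an emoji (an assumption).
event = {"name": "評価", "comment": "Привет 👋"}

escaped = json.dumps(event)
preserved = json.dumps(event, ensure_ascii=False)

# Both representations decode to the same Python object.
assert json.loads(escaped) == json.loads(preserved) == event

# Non-ASCII output has to be encoded before it goes on the wire;
# UTF-8 is the usual choice, and json.loads accepts UTF-8 bytes directly.
body = preserved.encode("utf-8")
assert json.loads(body) == event

# The escaped form is longer than the UTF-8 encoded form for this input,
# so any size measured on the serialized text changes as well.
print(len(escaped), len(preserved), len(body))
```

One practical consequence is that a size check based on the serialized payload may report smaller values after the change, since each `\uXXXX` escape takes six ASCII characters.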

Important Files Changed

Changed Files

| Filename | Score | Overview |
|---|---|---|
| `langfuse/_client/attributes.py` | 5/5 | Added `ensure_ascii=False` to `json.dumps()` in the `_serialize` function used for OpenTelemetry span attributes |
| `langfuse/_utils/request.py` | 5/5 | Added `ensure_ascii=False` to `json.dumps()` in the HTTP client used for all API requests to Langfuse |
| `langfuse/_task_manager/score_ingestion_consumer.py` | 5/5 | Added `ensure_ascii=False` to two `json.dumps()` calls in the score ingestion batch processing pipeline |
| `tests/test_unicode_serialization.py` | 5/5 | New comprehensive test file validating Unicode character preservation across multiple languages and emoji |

Confidence score: 5/5

- This PR is safe to merge with minimal risk as it only changes JSON serialization format without affecting data structures or API contracts.
- Score reflects the backward-compatible nature of the change and comprehensive test coverage for Unicode handling.
- No files require special attention as all changes are straightforward serialization parameter additions.

Sequence Diagram

sequenceDiagram
    participant User
    participant LangfuseClient as "Langfuse Client"
    participant EventSerializer as "Event Serializer"
    participant JSONEncoder as "JSON Encoder"
    participant APIEndpoint as "API Endpoint"

    User->>LangfuseClient: "serialize data with unicode content"
    LangfuseClient->>EventSerializer: "_serialize(data)"
    EventSerializer->>JSONEncoder: "json.dumps(obj, cls=EventSerializer, ensure_ascii=False)"
    JSONEncoder-->>EventSerializer: "serialized json string with preserved unicode"
    EventSerializer-->>LangfuseClient: "unicode-preserved json string"
    
    User->>LangfuseClient: "batch_post(**kwargs)"
    LangfuseClient->>EventSerializer: "json.dumps(kwargs, cls=EventSerializer, ensure_ascii=False)"
    EventSerializer-->>LangfuseClient: "serialized data with unicode preserved"
    LangfuseClient->>APIEndpoint: "POST request with unicode content"
    APIEndpoint-->>LangfuseClient: "response"
    LangfuseClient-->>User: "response"

    User->>LangfuseClient: "upload score events"
    LangfuseClient->>EventSerializer: "serialize events with unicode"
    EventSerializer->>JSONEncoder: "json.dumps(event, cls=EventSerializer, ensure_ascii=False)"
    JSONEncoder-->>EventSerializer: "unicode-preserved serialization"
    EventSerializer-->>LangfuseClient: "serialized events"
    LangfuseClient->>APIEndpoint: "upload batch with unicode content"
    APIEndpoint-->>LangfuseClient: "upload response"
    LangfuseClient-->>User: "upload complete"
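The `json.dumps(obj, cls=EventSerializer, ensure_ascii=False)` calls in the diagram combine a custom encoder class with the `ensure_ascii` flag. The sketch below shows that the two options compose. `DemoSerializer` is a hypothetical stand-in written for this example; it is not the SDK's `EventSerializer`, whose actual behavior is not shown in this PR.

```python
import json
from datetime import datetime, timezone


class DemoSerializer(json.JSONEncoder):
    """Hypothetical encoder used only for illustration."""

    def default(self, obj):
        # Convert datetimes to ISO 8601 strings; defer everything else
        # to the base class, which raises TypeError for unknown types.
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


event = {
    "output": "你好",
    "timestamp": datetime(2025, 10, 1, tzinfo=timezone.utc),
}

# cls selects the encoder; ensure_ascii=False keeps non-ASCII text intact.
serialized = json.dumps(event, cls=DemoSerializer, ensure_ascii=False)
print(serialized)  # {"output": "你好", "timestamp": "2025-10-01T00:00:00+00:00"}
```

The flag is handled by the encoder machinery itself, so a custom `default()` method does not need to know about it.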


CLAassistant · 2025-10-01T04:38:16Z

All committers have signed the CLA.

greptile-apps

4 files reviewed, no comments


gfx · 2025-10-01T06:01:51Z

Ah, this is exactly the same as #1330. Closing.

chore: set ensure_ascii=False for json serialization to preserve unicode chars

500db8d

greptile-apps bot reviewed Oct 1, 2025


gfx closed this Oct 1, 2025

gfx deleted the gfx/do_not_encode_unicode_in_outputs branch October 1, 2025 06:01
